Skip to content

Fix/issue 3311#3313

Open
aartisonigra wants to merge 5 commits into
redis:masterfrom
aartisonigra:fix/issue-3311
Open

Fix/issue 3311#3313
aartisonigra wants to merge 5 commits into
redis:masterfrom
aartisonigra:fix/issue-3311

Conversation

@aartisonigra

@aartisonigra aartisonigra commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Fixes #3311

This change resolves a sharded Pub/Sub resubscription issue during in-place slot migrations in Redis Cluster.

Changes
Align sharded Pub/Sub migration event handling.
Ensure resubscribe and rediscover logic is triggered when a sharded channel is moved.
Fix typo in the emitted error event name:
sharded-shannel-moved-error → sharded-channel-moved-error


Note

Medium Risk
Touches cluster sharded Pub/Sub migration and topology rediscovery; behavior change is targeted but affects live slot moves where subscriptions must survive.

Overview
Fixes #3311: sharded Pub/Sub subscriptions could be silently dropped when a slot moved while the client was already subscribed.

The cluster sharded Pub/Sub connection now listens for sharded-channel-moved (emitted from the node client when the server pushes SUNSUBSCRIBE) instead of the non-existent server-sunsubscribe handler. On that event it rediscovers topology, opens the sharded client on the new slot owner, and reattaches listeners via extendPubSubChannelListeners. Failure paths now emit sharded-channel-moved-error (typo fix from sharded-shannel-moved-error).

Adds an integration test that subscribes before migrating the channel’s slot in place, then publishes until the listener receives a message again.

Reviewed by Cursor Bugbot for commit 2b01ecf. Bugbot is set up for automated code reviews on this repo. Configure here.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit f9228b9. Configure here.

Comment thread packages/client/lib/client/index.ts
@nkaradzhov

Copy link
Copy Markdown
Collaborator

@aartisonigra thanks, i will have a look!

@nkaradzhov nkaradzhov left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tracking down #3311 — the diagnosis is right (the cluster resubscribe never fires on in-place slot migration), but I'd like to land it differently.

  1. server-sunsubscribe is a stale event name, the right one is sharded-channel-moved

So instead of emitting a second event from the queue, we should just point the cluster listener at the event we already emit. That keeps a single, documented event and avoids touching the queue at all.

  1. The index.ts change breaks constructor arity. The queue needs no change at all, so please revert the index.ts edit entirely.

  2. Please drop the FT.SEARCH changes.

  3. Regression test. The existing tests never exercise the resubscribe path. Here's one that subscribes first, then migrates in place — it fails on the dead listener and passes with the fix:

    // Regression for #3311: subscribe FIRST, then migrate the slot in place.
    // The server pushes SUNSUBSCRIBE to the already-subscribed client, which
    // must drive the cluster to rediscover and reattach the listener on the
    // new owner. The dead `server-sunsubscribe` listener meant this never
    // happened and the subscription was silently lost after migration.
    testUtils.testWithCluster('should resubscribe a sharded channel after in-place slot migration (#3311)', async cluster => {
      const SLOT = 10328, // slot of `channel`
        migrating = cluster.slots[SLOT].master,
        importing = cluster.masters.find(master => master !== migrating)!,
        [migratingClient, importingClient] = await Promise.all([
          cluster.nodeClient(migrating),
          cluster.nodeClient(importing)
        ]);

      const listener = spy();

      // subscribe BEFORE migration -> the sharded PubSub client attaches to `migrating`
      await cluster.sSubscribe('channel', listener);

      // move the slot in-place to `importing`; `migrating` loses the slot and
      // pushes SUNSUBSCRIBE to the subscribed client
      await Promise.all([
        migratingClient.clusterDelSlots(SLOT),
        importingClient.clusterDelSlots(SLOT),
        importingClient.clusterAddSlots(SLOT),
        migratingClient.clusterSetSlot(SLOT, 'NODE', importing.id),
        importingClient.clusterSetSlot(SLOT, 'NODE', importing.id)
      ]);

      // the reattach is async and sharded PubSub does not buffer, so
      // publish until the resubscribed listener receives the message.
      // With the bug this never reattaches and the loop exhausts -> assertion fails.
      for (let i = 0; i < 50 && !listener.called; i++) {
        await cluster.sPublish('channel', 'message');
        await setTimeout(100);
      }

      assert.ok(listener.calledWithExactly('message', 'channel'));
    }, {
      serverArguments: [],
      minimumDockerVersion: [7]
    });

Requires import { setTimeout } from 'node:timers/promises'; at the top of the spec.

Net change: ~2 lines in cluster-slots.ts + the test. No queue/index.ts churn, no RediSearch change.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the search part of this PR?

Comment thread packages/client/lib/client/index.ts
@aartisonigra aartisonigra requested a review from nkaradzhov June 23, 2026 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants